The OP
 

NAND Flash Data Loss

Our company has a product that could no longer boot half a month after it was shipped to the user. We have determined that key files are missing from the file system, but the cause of the loss is unknown; please advise. The device can enter U-Boot, load the kernel, and mount the file system, but when the kernel tries to execute linuxrc it reports that the file cannot be found. Linux then exits the init process and the system appears hung. Checking the file system from U-Boot shows that the device contains only the following entries:
======================
   688397480206                      xp
   688397480206                      dev
   963275387150                      mnt
   688397480206                      tmp
   688397480206                      sys
   2887420735758                      var
   1513031201038                      usr
   963275387150                      arch
   963275387150                      home
   688397480206                      proc
   19311375675662                      sbin
   963275387150                      uptarfile
========================
A normal device should have the following entries:
=======================
   688397480134                      xp
   12095830618310                      app
   23056587157702                      bin
   688397480134                      dev
   6529553002694                      etc
   20651405471942                      lib
   1513031200966                      mnt
   688397480134                      nfs
   997635125446                      opt
   688397480134                      srv
   688397480134                      tmp
   688397480134                      sys
   2887420735686                      var
   1787909107910                      usr
   963275387078                      arch
   1238153294022                      home
   688397480134                      proc
   19311375675590                      sbin
   688397480134                      root
   48447353030                      linuxrc
   963275387078                      uptarfile
   1452901658822                      .ash_history
   688397480134                      media
   688397480134                      modules
=========================
Key entries missing from the faulty device:
=======================
app bin etc lib nfs opt srv root linuxrc .ash_history media modules
========================
The system keeps logs, and some information is written to the flash every time a key action is performed.

Device history:
==================
On April 8, the product passed a 3-day high/low temperature run test (-10 to 50 degrees Celsius) in-house, and it was shipped to the user on April 11.
The user powered it on once to check, and it was normal.
On April 12, the user powered it on once, and it was normal.
The device has no boot records for the following half month.
On May 3, the user reported that the device could not start normally.
=====================
Device partitions:
=============
mtd0: 00c00000 00020000 "reserve"   12M, U-Boot + main kernel + backup kernel
mtd1: 00080000 00020000 "reserve"   512K, reserved
mtd2: 00080000 00020000 "reserve"   512K, reserved
mtd3: 00200000 00020000 "bmp"       2M, logo storage
mtd4: 00080000 00020000 "reserve"   512K, reserved
mtd5: 04000000 00020000 "rootfs"    64M, UBIFS
mtd6: 03080000 00020000 "opt"       48.5M
===================
What I can be sure of: during normal operation the system never deletes any of the following entries. At most, some configuration files are written to the app directory.
app bin etc lib nfs opt srv root linuxrc .ash_history media modules
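The list of missing entries can be double-checked mechanically. The sketch below is only an illustration: the entry names are copied by hand from the two U-Boot listings above, and the function name `missing_entries` is made up for this example.

```python
# Entry names copied from the two U-Boot listings in the post.
FAULTY = "xp dev mnt tmp sys var usr arch home proc sbin uptarfile".split()
NORMAL = (
    "xp app bin dev etc lib mnt nfs opt srv tmp sys var usr arch home "
    "proc sbin root linuxrc uptarfile .ash_history media modules"
).split()


def missing_entries(normal, faulty):
    """Return entries present in `normal` but absent from `faulty`, in listing order."""
    present = set(faulty)
    return [name for name in normal if name not in present]


if __name__ == "__main__":
    print(" ".join(missing_entries(NORMAL, FAULTY)))
```

Run on the two listings, this yields the same twelve names given in the post, so the hand-made list is complete.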

This post is from stm32/stm8

Latest reply: 2018-5-15 08:14
 

2
 
The machine was shut down and unused for half a month, so there can have been no file system operations during that period. It is possible, though, that the damage was already present before the last normal shutdown and simply had not shown up yet. For example, a file may have been cached in memory, so no error appeared until the device was powered off and on again. Or the startup-related files were damaged after the last successful boot, so that only the next power-on fails.

Those are only guesses. The key point is this: when I read that data was lost after the device sat idle for half a month, the first thing that came to mind was NAND flash data retention. Bits stored in NAND can flip after the device has been left for some time. If only a few bits flip, the ECC in the controller corrects the data. If more bits flip than the ECC can correct, the data cannot be read back correctly. The higher the temperature, the more pronounced the problem. This is an inherent NAND characteristic: SLC is the best, MLC is second, and TLC is the worst. If you are not cost-sensitive, choose SLC.

Data retention is a mandatory test item for SSDs and USB flash drives: write data, bake at high temperature, then verify the data. It is also a problem that NAND management has to deal with. If you can debug the device, check whether NAND read ECC errors occur when the boot fails; this may require looking at the NAND driver layer. If that turns out to be the problem, see whether the system allows increasing the number of ECC check bits.
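One way to look for ECC activity without touching the driver is to read the per-partition counters that recent mainline kernels expose under /sys/class/mtd. This is a rough sketch, not something from the original posts: it assumes the product's kernel provides the attributes `ecc_strength`, `corrected_bits`, `ecc_failures` and `bad_blocks` (older kernels may not), and the function names are made up for this example.

```python
from pathlib import Path

# Assumption: the kernel exposes these per-MTD attributes under /sys/class/mtd.
ECC_ATTRS = ("ecc_strength", "corrected_bits", "ecc_failures", "bad_blocks")


def read_ecc_stats(mtd_name, sysfs_root="/sys/class/mtd"):
    """Return {attribute: int or None} for one MTD device, e.g. 'mtd5'."""
    stats = {}
    for attr in ECC_ATTRS:
        path = Path(sysfs_root) / mtd_name / attr
        try:
            stats[attr] = int(path.read_text().strip())
        except (OSError, ValueError):
            stats[attr] = None  # attribute missing or unreadable on this kernel
    return stats


def retention_suspected(before, after):
    """True if corrected bit flips or ECC failures increased between two snapshots."""
    for attr in ("corrected_bits", "ecc_failures"):
        b, a = before.get(attr), after.get(attr)
        if b is not None and a is not None and a > b:
            return True
    return False
```

A possible use: take a snapshot with `read_ecc_stats("mtd5")`, read the whole rootfs partition once, take a second snapshot, and compare the two with `retention_suspected`. Since the faulty unit cannot reach user space from its own rootfs, this would have to run from a rescue system such as an NFS root.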

 

3
 
You can try the following to reproduce the problem. Run the device at normal temperature and write to the log file during operation. After it has run for a while, power it off and leave it in a high-temperature environment. To reproduce the problem sooner, raise the temperature and extend the bake time, then start the system at normal temperature. If the problem still does not reproduce, repeat the cycle several times.
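The write, bake, verify cycle described above can be scripted so that a failed boot is not the only indicator. The following is a minimal sketch under assumptions of my own: file names, sizes and the `manifest.json` layout are invented for this example, and Python is assumed to be available on the target or on a rescue system.

```python
import hashlib
import json
import os
from pathlib import Path


def write_pattern_files(directory, count=16, size=128 * 1024, seed=b"retention"):
    """Write `count` deterministic files plus a manifest of their SHA-256 digests."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for i in range(count):
        # Deterministic pseudo-random content derived from the seed and the index.
        data = hashlib.shake_256(seed + str(i).encode()).digest(size)
        path = directory / f"pattern_{i:04d}.bin"
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to flash before the bake
        manifest[path.name] = hashlib.sha256(data).hexdigest()
    with open(directory / "manifest.json", "w") as f:
        json.dump(manifest, f)
        f.flush()
        os.fsync(f.fileno())
    return manifest


def verify_pattern_files(directory):
    """Return the names of files that are missing or whose content has changed."""
    directory = Path(directory)
    manifest = json.loads((directory / "manifest.json").read_text())
    bad = []
    for name, digest in manifest.items():
        try:
            ok = hashlib.sha256((directory / name).read_bytes()).hexdigest() == digest
        except OSError:
            ok = False  # missing or unreadable file counts as a failure
        if not ok:
            bad.append(name)
    return bad
```

Call `write_pattern_files` before the bake and `verify_pattern_files` after it. The manifest lives on the same flash and could itself be damaged, so keeping a copy off the device is safer.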
 

4
 
We have basically settled on a suspect. Every shutdown of the product is a forced power-off, so the integrity of the file system is not guaranteed. The one-button power on/off circuit of the company's products is the one described in my previous post:
Power on: long-press for 2 seconds.
Power off: long-press for 2 seconds.
https://bbs.eeworld.com.cn/forum ... 0&page=1#pid2364221

The actual shutdown process: after the user has held the button for 2 seconds, sync() is executed to flush files, and the screen goes dark to prompt the user to let go. Power is cut only when the button is released; if the user keeps holding it, the device never powers off (see the circuit in the linked post). If files are still being written after those 2 seconds and the user releases the button while the file table is being rewritten, the file system is left incomplete, and the behavior at the next startup is unpredictable.

Why I don't use reboot: if the software called reboot after the 2-second press, the file system would certainly stay intact, and the screen would still go dark to prompt the user to release the button. But if the user does not let go, the device re-enters U-Boot and starts up again. Users would then complain about how hard it is to shut down our device (a long-standing problem in the company's design).
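One possible way to narrow the window described above is to remount the root file system read-only before darkening the screen, so that nothing can be written while the user is still holding the button. This is only a sketch of that idea, not the product's actual code: `signal_user` is a hypothetical callback standing in for whatever turns the screen off, and it assumes the application has already stopped its own writers.

```python
import os
import subprocess


def shutdown_on_long_press(signal_user, run=subprocess.run, sync=os.sync):
    """Flush data, try to remount / read-only, then tell the user to release the button.

    Returns True if the remount succeeded.
    """
    sync()  # flush dirty data first
    remounted = run(["mount", "-o", "remount,ro", "/"]).returncode == 0
    if not remounted:
        # The remount can fail while files are still open for writing;
        # flush once more as a fallback.
        sync()
    signal_user()  # hypothetical: e.g. turn the screen off
    return remounted
```

The `run` and `sync` parameters exist only so the sequence can be exercised on a development machine without touching the real root file system.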

Comment (published on 2018-5-13 20:59): Uh oh
 
 
 

5
 
The file system is UBIFS. Reproduction attempt:
One process repeatedly writes 2000 files of 100K each to the device;
another process runs sync from time to time;
the processor reset button is pressed before a sync has completed.
After fewer than 10 repetitions, the file system becomes read-only and cannot be remounted read-write with the command mount -o remount,rw /. U-Boot finds no bad blocks, and after re-flashing the file system the device works normally again.
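The write load in this test could look roughly like the sketch below. It is an assumption-laden stand-in for the real test programs, which are not shown in the thread: it folds the writer and the occasional sync into a single process instead of two, and the file names and `sync_probability` parameter are invented for this example.

```python
import os
import random
from pathlib import Path


def write_files(directory, count=2000, size=100 * 1024,
                sync_probability=0.01, rng=None, sync=os.sync):
    """Write `count` files of `size` bytes, calling sync() at random intervals.

    Returns the number of sync calls made.
    """
    rng = rng or random.Random()
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    payload = os.urandom(size)
    syncs = 0
    for i in range(count):
        (directory / f"stress_{i:04d}.bin").write_bytes(payload)
        if rng.random() < sync_probability:
            sync()  # the reset button would be pressed while this is in progress
            syncs += 1
    return syncs
```

Running `write_files("/opt/stress")` in a loop while pressing reset at random moments approximates the procedure in the post; the target directory here is only an example path.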

 
 
 

6
 
File system: UBIFS. Reproduction attempt:
One process creates 2000 files of 200K each;
a second process calls sync occasionally;
the processor reset button is pressed before the second process's sync has finished.
After about 10 repetitions the file system becomes read-only, and even mount -o remount,rw / does not fix it. U-Boot detects no bad blocks. Re-flashing the file system restores the device.
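To detect the read-only state automatically during such a stress loop, the test harness can parse the contents of /proc/mounts. The sketch below assumes the usual whitespace-separated format (device, mount point, type, options, ...); the function names are made up for this example.

```python
def mount_options(mounts_text, mount_point="/"):
    """Return the option list of the last entry mounted at `mount_point`, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # later entries shadow earlier ones
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if the file system at `mount_point` is currently mounted read-only."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options
```

On the target this would be called as `is_read_only(open("/proc/mounts").read())` after each reset cycle. When it reports True, the kernel log (dmesg) is the place to look for the reason UBIFS gave up on writing.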

 
 
 

7
 
lzwml posted on 2018-5-12 12:34: We have basically settled on a suspect. Every shutdown of the product is a forced power-off, so the integrity of the file system is not guaranteed. The one-button power switch of the company's products...
Oh
 
 

8
 
linuxrc is a text configuration file. Add debug output to it, reproduce the error, and see which step causes the failure.