Definition of Lossless Compression
The so-called lossless compression format, as the name implies, is an audio format that compresses sound signals without loss. Common formats such as MP3 and WMA are lossy compression formats. Compared with the WAV file as the source, they have a considerable degree of signal loss, which is the fundamental reason why they can achieve a compression rate of 10%. The lossless compression format is like using compression software such as Zip or RAR to compress the audio signal, and the resulting compressed format is restored to a WAV file, which is exactly the same as the WAV file as the source! However, if you use Zip or RAR to compress the WAV file, you must decompress the compressed package before you can play it. The lossless compression format can be directly played in real time through the playback software, and it is exactly the same as lossy formats such as MP3. In short, the lossless compression format is a format that can reduce the size of the WAV file without sacrificing any audio signal.
Commonly used lossless compression algorithms include Shannon-Fano coding, Huffman coding, run-length coding, LZW (Lempel-Ziv-Welch) coding and arithmetic coding.
Huffman Coding
This method constructs the codeword with the shortest average length of different-character headers based entirely on the probability of character occurrence. It is sometimes called the best code, and is generally called Huffman coding. It is a coding method that can achieve the minimum average code length for statistically independent information sources. The coding efficiency is high.
Basic principle:
The code is constructed according to the probability of occurrence of the source characters. A shorter code length is given to the source characters with a higher probability of occurrence, and a longer code length is given to the source characters with a lower probability of occurrence, so that the average codeword of the encoding is the shortest.
Coding steps:
(1) Initialization: sort the symbols in descending order according to their probability.
(2) The two symbols with the smallest probability form a node.
(3) Repeat step 2.
(4) Starting from the root node to the "leaf" corresponding to each symbol, mark "0" (upper branch) or "1" (lower branch) from top to bottom. It does not matter which one is "1" and which one is "0". The final result is just that the assigned codes are different, while the average length of the codes is the same.
(5) Start from the root node and follow the branches to each leaf to write the code for each symbol.
Notes on Huffman coding:
Huffman coding has no error protection function. If there is an error in the code, it may cause a series of subsequent decoding errors.
Huffman coding is a variable-length coding, so it is difficult to arbitrarily search or call the file content.
Huffman relies on the statistical properties of the source. Each codeword of Huffman coding is an integer: therefore, in practice, the average code length is difficult to reach the size of information entropy.
Huffman encoding and decoding requires a code table. If the number of messages is large, the code table that needs to be stored will also be large, which will affect the system's storage capacity and encoding and decoding speed.
Arithmetic Coding
Arithmetic coding represents a source set as an interval between 0 and 1 on the real number line. Each element in this set is used to shorten this interval. The more elements in the source set, the smaller the interval is. When the interval becomes smaller, more digits are needed to represent the interval. This is the principle of interval as code. Arithmetic coding first assumes a probability model of the source, and then uses these probabilities to reduce the interval representing the source set.
Arithmetic coding uses two basic parameters :
The probability of a symbol and its coding interval
The probability of the source symbol determines the efficiency of the compression coding and also determines the intervals of the source symbols during the encoding process, and these intervals are between 0 and 1.
Several issues need to be noted in arithmetic coding:
1. Since the precision of actual computers cannot be infinite, overflow in calculations is an obvious problem, but most machines have 16-bit, 32-bit or 64-bit precision, so this problem can be solved using the scaling method.
2. Arithmetic The encoder generates only one codeword for the entire message, which is a real number in the interval [0, 1), so the decoder cannot decode before receiving all the bits representing this real number.
3. Arithmetic coding is also a coding method that is very sensitive to errors. If one bit is wrong, the entire message will be mistranslated.
Run-length encoding
RLE (Run-Length Encoding) is a compression scheme for data that contains multiple repetitions in a sequential order. The principle is to replace a series of repeated values with a single value plus a count value. The run length is the number of consecutive and repeated units. If you want to get the original data, just expand this encoding.
Comparing the number of codes before and after RLE encoding, we can find that before encoding, 73 codes are used to represent the data of this row, while after encoding, only 10 codes are used to represent the original 73 codes. The ratio of the amount of data before and after compression is about 771, that is, the compression ratio is 7:1. This shows that RLE is indeed a compression technology, and the encoding technology is practical.
The performance of RLE depends mainly on the characteristics of the image itself. RLE compression coding is particularly suitable for computer-generated images and is very effective in reducing the storage space of image files. However, since colorful natural images often have few consecutive pixels with the same color on the same line, and the number of consecutive lines with the same color value for several consecutive lines is even smaller, if the RLE encoding method is still used, not only will the image data not be compressed, but the original image data may become larger.
Decoding follows the same rules as encoding, and the data obtained after restoration is exactly the same as the data before compression.
Therefore, RLE is a lossless compression technology. It is used in BMP, JPEG/MPEG, TI FF and PDF encoding, and is also used in fax machines.
LZW encoding
LZW achieves compression by creating a string table and using shorter codes to represent longer strings. The correspondence between strings and codes is dynamically generated during the compression process and is implicit in the compressed data. It is restored according to the table during decompression. It is a lossless compression algorithm, the full name of which is Lempel-ziy- Welch encoding, or LZW for short.
Rationale
Extract different characters from the original text file data, create a compilation table based on these characters, and then use the index of the characters in the compilation table to replace the corresponding characters in the original text file data to reduce the size of the original data. It looks similar to the implementation principle of the palette image, but it should be noted that the compilation table here is not created in advance, but dynamically created according to the original file data. When decoding, the original compilation table must be restored from the encoded data.
The specific execution steps of the LZW encoding algorithm are as follows:
1. The dictionary at the beginning contains all possible roots, and the current prefix P is empty;
2. Current character (C): = the next character in the character stream;
3. Judgment level - whether the string P+C is in the dictionary
(1) If "yes": P: = P + C // (extend P with C)
(2) If “No
① Output the codeword representing the current prefix P to the codeword stream
②Add the affix-string P+C to the dictionary;
③Let P: = C/ (now P contains only one sub-symbol C)
4. Determine whether there are more codewords to be translated in the codeword stream
(1) If “yes”, return to step 2
(2) If “No”
① Output the code representing the current prefix P to the codeword stream
②End
Previous article:Simple AM radio circuit diagram based on CD4011
Next article:Difference between lossy and lossless compression
- Huawei's Strategic Department Director Gai Gang: The cumulative installed base of open source Euler operating system exceeds 10 million sets
- Analysis of the application of several common contact parts in high-voltage connectors of new energy vehicles
- Wiring harness durability test and contact voltage drop test method
- Sn-doped CuO nanostructure-based ethanol gas sensor for real-time drunk driving detection in vehicles
- Design considerations for automotive battery wiring harness
- Do you know all the various motors commonly used in automotive electronics?
- What are the functions of the Internet of Vehicles? What are the uses and benefits of the Internet of Vehicles?
- Power Inverter - A critical safety system for electric vehicles
- Analysis of the information security mechanism of AUTOSAR, the automotive embedded software framework
Professor at Beihang University, dedicated to promoting microcontrollers and embedded systems for over 20 years.
- Innolux's intelligent steer-by-wire solution makes cars smarter and safer
- 8051 MCU - Parity Check
- How to efficiently balance the sensitivity of tactile sensing interfaces
- What should I do if the servo motor shakes? What causes the servo motor to shake quickly?
- 【Brushless Motor】Analysis of three-phase BLDC motor and sharing of two popular development boards
- Midea Industrial Technology's subsidiaries Clou Electronics and Hekang New Energy jointly appeared at the Munich Battery Energy Storage Exhibition and Solar Energy Exhibition
- Guoxin Sichen | Application of ferroelectric memory PB85RS2MC in power battery management, with a capacity of 2M
- Analysis of common faults of frequency converter
- In a head-on competition with Qualcomm, what kind of cockpit products has Intel come up with?
- Dalian Rongke's all-vanadium liquid flow battery energy storage equipment industrialization project has entered the sprint stage before production
- Allegro MicroSystems Introduces Advanced Magnetic and Inductive Position Sensing Solutions at Electronica 2024
- Car key in the left hand, liveness detection radar in the right hand, UWB is imperative for cars!
- After a decade of rapid development, domestic CIS has entered the market
- Aegis Dagger Battery + Thor EM-i Super Hybrid, Geely New Energy has thrown out two "king bombs"
- A brief discussion on functional safety - fault, error, and failure
- In the smart car 2.0 cycle, these core industry chains are facing major opportunities!
- The United States and Japan are developing new batteries. CATL faces challenges? How should China's new energy battery industry respond?
- Murata launches high-precision 6-axis inertial sensor for automobiles
- Ford patents pre-charge alarm to help save costs and respond to emergencies
- New real-time microcontroller system from Texas Instruments enables smarter processing in automotive and industrial applications
- 【CODING TALK】How would you implement a queue?
- RVB2601 Development Board Quick Start Guide
- How to input xy coordinates for hole location in pads packaging
- Strange behavior of Firefox browser
- Detailed explanation of ADC function in C8051F020 microcontroller
- Q and D values of capacitors
- [Nucleo G071 Review] PWM breathing light
- [EETalk] Why do smart homes not use ZigBee but are keen on Bluetooth and Wi-Fi?
- Selection of the discharge resistor of the X capacitor
- Help on ADN8834 burning out