Stable Diffusion can also compress images: smaller than JPEG and clearer to the naked eye, but don’t try it on faces.
Alex from Ao Fei Si
Qubit | Public account QbitAI
The free, open-source Stable Diffusion has found yet another new use:
this time, compressing images.
Starting from the same original image, Stable Diffusion not only produces a smaller file than JPEG or WebP, its output also looks visibly better.
The image compressed with Stable Diffusion retains more detail and shows fewer compression artifacts.
However, Matthias Bühlmann (MB below), the software engineer who ran this experiment, also points out that the method has obvious limitations.
It does not handle faces and text well, and after decoding back to full size it sometimes even produces features that were not in the original image.
For example (the result can be startling):
△ Left: the original image. Right: the image reconstructed after Stable Diffusion compression and decompression.
That said, the approach itself is worth a closer look.
How does Stable Diffusion compress images?
To explain how Stable Diffusion compresses images, it helps to start with how the model itself works.
Stable Diffusion is a particular kind of diffusion model known as a latent diffusion model.
Unlike a standard diffusion model, latent diffusion runs the diffusion process in a lower-dimensional latent space rather than in the actual pixel space.
In other words, the latent space representation is a compressed version of the image with lower resolution but higher precision.
Resolution and precision are two different things here: resolution is the number of pixels in the image, while precision is the number of bits used to store the values at each pixel.
Take this photo of a camel's head as an example: the original image is 768KB, with a resolution of 512×512 and a precision of 3×8 bits.
In Stable Diffusion's latent space, the resolution drops to 64×64 while the precision rises to 4×32 bits; the final compressed file is 4.98KB.
As a result, the image compressed with Stable Diffusion looks little different from the original.
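The sizes in the camel example can be checked with a little arithmetic. The sketch below is only an illustration: the helper name raw_size_bytes is made up for this example, and "KB" is taken to mean 1024 bytes, which is what makes the 768KB figure come out exactly.

```python
def raw_size_bytes(width, height, channels, bits_per_channel):
    """Uncompressed size in bytes of a width x height grid of channel values."""
    return width * height * channels * bits_per_channel // 8


# Original image: 512x512 pixels, 3 channels of 8 bits each.
image_bytes = raw_size_bytes(512, 512, 3, 8)
# Latent representation: 64x64 positions, 4 channels of 32 bits each.
latent_bytes = raw_size_bytes(64, 64, 4, 32)

print(image_bytes / 1024)   # 768.0
print(latent_bytes / 1024)  # 64.0
```

The floating-point latent alone is therefore about 12 times smaller than the raw pixels; getting down to roughly 5KB takes the quantization steps described further on.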
More specifically, Stable Diffusion's latent diffusion model has three main components:
a VAE (variational autoencoder), a U-Net, and a text encoder.
In this image compression experiment, however, the text encoder is not needed.
The main role is played by the VAE, which consists of two parts: an encoder and a decoder.
The VAE can therefore encode an image from image space into a latent space representation and decode it back again.
MB found that the VAE's decoder is very robust to quantization of the latent representation.
Quantizing the latent representation from floating point to 8-bit unsigned integers by scaling, clamping, and remapping yields a compressed image with little visible loss:
First, the latents are quantized to 8-bit unsigned integers. At this point the data size is 64×64×4×8 bits = 16 kB (the original image is 512×512×3×8 bits = 768 kB).
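A minimal sketch of this kind of 8-bit quantization is shown below. It is not MB's code: the value range LIMIT, the function names, and the random stand-in latents are all assumptions made for illustration. The idea is to scale the floats into [0, 1], limit anything outside that range, and remap the result to integers 0..255.

```python
import random


def quantize(values, limit):
    """Map floats to 8-bit unsigned integers: scale, clamp, then remap to 0..255."""
    out = []
    for v in values:
        unit = v / (2 * limit) + 0.5        # scale so [-limit, limit] becomes [0, 1]
        unit = min(1.0, max(0.0, unit))     # clamp outliers into [0, 1]
        out.append(int(unit * 255 + 0.5))   # remap to 0..255, rounding to nearest
    return bytes(out)


def dequantize(data, limit):
    """Approximate inverse of quantize()."""
    return [(b / 255 - 0.5) * 2 * limit for b in data]


random.seed(0)
LIMIT = 4.0  # assumed value range, not taken from the article
# Stand-in for a 64x64x4 latent; a real one would come from the VAE encoder.
latents = [random.gauss(0.0, 1.0) for _ in range(64 * 64 * 4)]

packed = quantize(latents, LIMIT)
restored = dequantize(packed, LIMIT)
# Error for values inside the assumed range (clamped outliers are excluded).
max_error = max(abs(a - b) for a, b in zip(latents, restored) if abs(a) <= LIMIT)

print(len(packed))                      # 16384 bytes, i.e. 16 kB
print(max_error <= LIMIT / 255 + 1e-9)  # True: at most half a quantization step
```

For values inside the assumed range, the round trip is off by at most half a quantization step, which is the kind of small perturbation the decoder is reported to tolerate.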
Then, a palette and dithering are applied to reduce the data further to 5 kB, while also improving the quality of the restored image.
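To make the palette-and-dithering step more concrete, here is a small sketch under stated assumptions. It uses Floyd-Steinberg error diffusion and builds the palette by randomly sampling existing pixels; both choices are illustrative and not necessarily what MB used. The demo grid is kept small so it runs quickly. The final size calculation assumes a 256-entry palette, which is one way to arrive at the 5 kB figure.

```python
import random


def nearest(palette, pixel):
    """Index of the palette entry closest to pixel (squared Euclidean distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(palette[i], pixel)))


def dither(grid, palette):
    """Floyd-Steinberg dithering: replace each pixel by a palette index and
    push the rounding error onto neighbours that have not been visited yet."""
    height, width = len(grid), len(grid[0])
    work = [[list(px) for px in row] for row in grid]  # mutable working copy
    indices = [[0] * width for _ in range(height)]
    # (dx, dy, weight) entries of the classic Floyd-Steinberg kernel
    kernel = [(1, 0, 7 / 16), (-1, 1, 3 / 16), (0, 1, 5 / 16), (1, 1, 1 / 16)]
    for y in range(height):
        for x in range(width):
            idx = nearest(palette, work[y][x])
            indices[y][x] = idx
            error = [a - b for a, b in zip(work[y][x], palette[idx])]
            for dx, dy, weight in kernel:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny < height:
                    for c, e in enumerate(error):
                        work[ny][nx][c] += e * weight
    return indices


random.seed(0)
SIZE, CHANNELS, COLORS = 16, 4, 16  # small demo values to keep the run fast
grid = [[[random.randrange(256) for _ in range(CHANNELS)] for _ in range(SIZE)]
        for _ in range(SIZE)]
# Assumption: the palette is simply a random sample of existing pixels.
palette = random.sample([px for row in grid for px in row], COLORS)
indices = dither(grid, palette)

# Storage at the article's dimensions, assuming a 256-entry palette:
# one index byte per latent position plus 256 entries of 4 bytes each.
total_bytes = 64 * 64 * 1 + 256 * 4
print(total_bytes)  # 5120 bytes, i.e. 5 kB
```

Dithering spreads each position's rounding error over its neighbours, so the average over a small region stays closer to the unpalettized values than plain nearest-entry mapping would.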
As a rigorous programmer, MB did not rely on visual inspection alone; he also measured image quality quantitatively.
However, judged by two standard image quality metrics, PSNR (peak signal-to-noise ratio) and SSIM (structural similarity), Stable Diffusion's compression results are not much better than those of JPG and WebP.
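For readers unfamiliar with PSNR, the sketch below computes it from the mean squared error between two pixel sequences, using the standard definition. The pixel values are made up for illustration and are not MB's measurements. SSIM is more involved and is left out here.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10 * math.log10(peak ** 2 / mse)


original = [0, 64, 128, 255]
reconstructed = [0, 66, 126, 255]  # two pixels are off by 2
print(round(psnr(original, reconstructed), 2))  # 45.12
```

A higher PSNR means a smaller pixel-by-pixel error. It says nothing about whether the errors look natural, which is one reason a reconstruction can look better to the eye without scoring much higher.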
In addition, when the latent representation is decoded back to the original resolution, the main features of the image remain visible, but the VAE also fills in high-resolution detail of its own.
In plain terms, the reconstructed image often differs from the original and contains many newly generated "ghost" features.
Let's review this picture again:
Although using Stable Diffusion to compress images still has many problems, in MB's words the results are impressive and the approach shows great promise.
MB has published the code on Google Colab, so anyone interested can take a closer look.
Link:
https://colab.research.google.com/drive/1Ci1VYHuFJK5eOX9TB0Mq4NsqkeDrMaaH?usp=sharing
Reference links:
[1] https://arstechnica.com/information-technology/2022/09/better-than-jpeg-researcher-discovers-that-stable-diffusion-can-compress-images/
[2] https://matthias-buehlmann.medium.com/stable-diffusion-based-image-compresssion-6f1f0a399202
[3] https://huggingface.co/blog/stable_diffusion