Google outdoes itself with new AI: Imagen can now generate a specified subject, and the style can be changed at will
By Yi Ge, from Aofei Temple
QbitAI | WeChat official account QbitAI
How powerful would Imagen become if it could generate exactly the subject you point it at?
Just upload 3-5 photos of the subject you have in mind, then describe in text the background, action, or expression you want, and that subject will "teleport" into the scene, with lifelike poses and expressions.
It works not only for animals but also for objects such as sunglasses, schoolbags, and vases, all of which come out looking almost real.
The results are convincing enough that nobody would spot a flaw even if you posted them to WeChat Moments. (doge)
This remarkable text-to-image generation model is called DreamBooth. It is Google's latest research result, fine-tuned from Imagen, and it set off heated discussion on Twitter as soon as it was released.
Some netizens joked: This is simply the most advanced meme generator.
Relevant research papers have been uploaded to arXiv.
You can "travel around the world" with just a few photos
Before getting to the principles, let's look at what DreamBooth can do: changing the scene, specifying actions, expressions, and outfits, changing styles, and more.
If you are a pet owner, the model's "scene-changing ability" lets you take your dog out without leaving home: to the Palace of Versailles, to the foot of Mount Fuji... all with no effort.
△ The lighting is also more natural
Beyond that, the pet's actions and expressions can be specified at will. This really is "Photoshop in one sentence", with the details done right.
In addition to the "basic operations" mentioned above, DreamBooth can even change various photo styles, which is the so-called "adding filters".
For example, it can render the dog in the style of various "world-famous paintings" or from different viewpoints, with thoroughly artistic results.
Dressing them up? Adding all kinds of cosplay props is a piece of cake.
It can also change the subject's color and, more remarkably, even its species.
So, what is the principle behind this interesting effect?
Add a "special identifier" to the input
The researchers ran a comparison: unlike using large text-to-image models such as DALL·E 2 and Imagen directly, only the DreamBooth approach faithfully reproduces the subject in the input images.
As shown in the figure below, the input is three photos of a small alarm clock with a yellow "3" painted on the right side of its dial. The images generated by DreamBooth retain all the details of the clock, whereas the clocks that DALL·E 2 and Imagen produce over several attempts all differ "a little bit" from the original.
△ The real Li Kui and the impostor "Li Gui"
This is also DreamBooth's biggest feature: personalization.
Given 3-5 casually captured pictures of a subject, users get novel renditions of that subject in different contexts while its key features are preserved.
Of course, the authors also note that the method is not tied to a particular model. With the same kind of fine-tuning, DALL·E 2 could achieve this as well.
As for the method itself, DreamBooth attaches a "special identifier" to the subject.
In other words, the prompt an image generation model normally receives names only a class of object, such as [cat] or [dog]. DreamBooth places a special identifier in front of the class name, so the subject is written as [V] [object category].
The following figure is an example. Three dog photos uploaded by users and the corresponding class names (such as "dog") are used as input information to obtain a fine-tuned text-image diffusion model.
The diffusion model uses "a [V] dog" to specifically refer to the dog in the picture uploaded by the user, and then brings it into the text description to generate a specific image, where [V] is the special identifier.
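The prompt convention described above can be sketched in a few lines of Python. This is only an illustration of how "[V] [object category]" phrases might be assembled: the function names, the file names, and the literal string "[V]" are assumptions made for the example, not part of the DreamBooth code or of any model API.

```python
from typing import List, Tuple

# Placeholder for the special identifier, written as in the article.
# What the real identifier looks like is not specified here (assumption).
IDENTIFIER = "[V]"


def subject_phrase(class_name: str, identifier: str = IDENTIFIER) -> str:
    """Return the "[V] class" phrase that refers to the user's specific subject."""
    return f"{identifier} {class_name}"


def training_pairs(photo_paths: List[str], class_name: str) -> List[Tuple[str, str]]:
    """Pair every uploaded photo with the same prompt, e.g. "a [V] dog"."""
    prompt = f"a {subject_phrase(class_name)}"
    return [(path, prompt) for path in photo_paths]


def generation_prompt(class_name: str, context: str) -> str:
    """Embed the subject phrase in a free-form description of the new scene."""
    return f"a {subject_phrase(class_name)} {context}"


if __name__ == "__main__":
    # Hypothetical file names standing in for the 3-5 uploaded photos.
    for pair in training_pairs(["dog1.jpg", "dog2.jpg", "dog3.jpg"], "dog"):
        print(pair)
    print(generation_prompt("dog", "at the foot of Mount Fuji"))
```

The same subject phrase appears both in the fine-tuning prompt and in later scene descriptions, which is what lets the identifier point at the one dog from the uploaded photos.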
Why not use [V] alone to refer to the entire [specific object]?
The authors explain that with so few input photos, the model cannot learn the overall characteristics of the subject well and may even overfit.
Hence the fine-tuning approach: generation still rests on the [object category] features the AI has already learned, which are then modified by the specific features learned for [V].
Take generating a white dog as an example. The model uses [V] to learn the dog's color (white), body shape, and other individual details, and combines them with the general characteristics of dogs it learned under the broad class [dog]. The result is more photos of the white dog that are both plausible and true to the individual.
To train this fine-tuned text-to-image diffusion model, the researchers first fine-tune the low-resolution text-to-image model on the user's photos, paired with a text prompt containing the identifier and the class name.
The super-resolution diffusion components are then fine-tuned as well, so that the fine details of the user's specific dog are preserved at high resolution.
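The low-resolution and super-resolution stages work on the same subject at different scales, and the paper describes fine-tuning the super-resolution components on low-resolution/high-resolution pairs made from the input photos. As a rough illustration only, such a pair could be produced from one image as follows. The block-averaging downsampler, the tiny grayscale image, and the function names are assumptions for the example, not the paper's actual pipeline.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def downsample(image: Image, factor: int) -> Image:
    """Shrink an image by averaging each factor x factor block of pixels."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    low_res: Image = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [
                image[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) / len(block))
        low_res.append(row)
    return low_res


def make_pair(image: Image, factor: int = 2) -> Tuple[Image, Image]:
    """Return a (low-resolution, high-resolution) pair built from one photo."""
    return downsample(image, factor), image


if __name__ == "__main__":
    photo = [
        [0.0, 2.0, 4.0, 6.0],
        [2.0, 4.0, 6.0, 8.0],
        [1.0, 1.0, 3.0, 3.0],
        [1.0, 1.0, 3.0, 3.0],
    ]
    low, high = make_pair(photo)
    print(low)
```

A real system would use a proper image library and resampling filter; the point here is only that each uploaded photo can serve as its own high-resolution target.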
Research team
The DreamBooth research team comes from Google, and the first author is Nataniel Ruiz.
Nataniel Ruiz is a fourth-year PhD student in the Image and Video Computing Group at Boston University and is currently interning at Google. His main research interests are generative models, image translation, adversarial attacks, and facial analysis and simulation.
The link to the paper is attached at the end of the article. If you are interested, please take a look~
Paper address:
https://arxiv.org/abs/2208.12242
Reference links:
[1] https://dreambooth.github.io/
[2] https://twitter.com/natanielruizg/status/1563166568195821569
[3] https://natanielruiz.github.io/