
This video went viral online. Google has made it far too easy to fake videos with AI.

Last updated: 2023-01-16
Jin Lei and Pine, from Aofeisi
Qubit | WeChat official account QbitAI

Dear friends, AI-generated video has been pushed into the spotlight once again today.

The trigger was a video of a little penguin that someone posted online:

Does it match what you had pictured?

Overall, this AI manages seamless transitions even for such an imaginative scene prompt.

No wonder netizens who watched it exclaimed, "(Technology) is developing so fast."

With shorter prompts, Phenaki performs even better.

For example, feed Phenaki this text:

A realistic teddy bear is diving; then it slowly surfaces and walks onto the beach; then the camera zooms out to show the teddy bear walking by a bonfire on the beach.

Not enough? Let’s do another paragraph, this time with a different protagonist:

On Mars, the astronaut walked through a puddle, and his silhouette was reflected in the water; he danced next to the water; then the astronaut started walking his dog; and finally he and the dog watched Mars and fireworks together.

When Google first released Phenaki, it also demonstrated generating a video from an initial frame plus a text prompt.

For example, given a static image like this:

Then feed Phenaki a simple sentence: "the white cat touches the camera with its paw." The result:

Starting from the same picture, change the prompt to "A white cat yawns" and you get this:

Of course, you can also switch the overall style of the video at will:
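As an aside, the demos above boil down to two conditioning modes: text alone, or an initial frame plus text. Phenaki has no public API, so the sketch below is purely hypothetical, with invented names, just to make the two modes concrete:

```python
# Purely hypothetical interface; Phenaki's real API is not public.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PhenakiRequest:
    prompt: str                        # e.g. "A white cat yawns"
    first_frame: Optional[str] = None  # optional conditioning image (path)

# Same starting image, different prompts -> different videos:
touch = PhenakiRequest("the white cat touches the camera with its paw", "cat.png")
yawn = PhenakiRequest("A white cat yawns", "cat.png")
```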

Netizens: will the video industry be disrupted by AI?

Besides Phenaki, Google also released Imagen Video at the time, which can generate high-definition video clips at 1280×768 resolution and 24 frames per second.

It builds on Imagen, the SOTA image generation model, and demonstrates three notable capabilities:

  • Understands and generates works in different artistic styles, such as watercolor, pixel art, and even Van Gogh's style

  • Understands the 3D structure of objects

  • Inherits Imagen's ability to accurately render text

Earlier still, Meta released Make-A-Video, which can not only generate videos from text but also produce videos from images, for example:

  • Turning a still image into a video

  • Frame interpolation: generating a video from a before-and-after pair of images

  • Generating a new video based on an original video
    ...

Some people are worried about the sudden emergence of generative video models:

Of course, some people think that the time has not yet come:

Going from 0 to 1 is always fast, but going from 1 to 100 will still take a long time.

However, some netizens are already looking forward to relying on AI to win Oscars:

How long will it take for AI to become the new video editor, or win an Oscar?

How it works

Returning to Phenaki: many netizens are curious how it generates such smooth videos from text.

Simply put, compared with previous generative video models, Phenaki pays more attention to two things: arbitrary video length and coherence.

Phenaki's ability to generate videos of arbitrary length is largely due to a new encoder-decoder architecture: C-ViViT.

It is a causal variant of ViViT capable of compressing videos into discrete embeddings.

Keep in mind that previous approaches to video compression had one of two problems: either the encoder could not compress the video along the time axis, so the generated videos were too short (e.g. VQ-GAN), or the encoder only supported a fixed video length, so the output length could not be adjusted at all (e.g. VideoVQVAE).

C-ViViT is different: it combines the advantages of both. It compresses video in both the time and space dimensions, and because it remains autoregressive in time, it can autoregressively generate videos of any length.
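To make the idea concrete, here is a minimal sketch, not Google's code, of what "compress in space, stay causal in time, quantize to discrete tokens" can look like. The patch size, codebook size, and the single attention layer are all illustrative assumptions:

```python
# Minimal C-ViViT-flavored sketch (illustrative, not Google's code):
# cut a video into spatial patches, attend over time with a *causal* mask so
# frame t never sees the future, then snap each vector to its nearest
# codebook entry to get discrete tokens.
import torch
import torch.nn.functional as F

def causal_time_mask(t: int) -> torch.Tensor:
    # True where attention is forbidden (i.e., positions in the future)
    return torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)

def encode_video(frames: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """frames: (T, C, H, W) -> discrete token ids of shape (T, N_patches).
    Causality over T is what allows continuing generation to any length."""
    T, C, H, W = frames.shape
    p = 8                                          # spatial patch size (assumed)
    x = frames.unfold(2, p, p).unfold(3, p, p)     # (T, C, H/p, W/p, p, p)
    x = x.reshape(T, C, -1, p * p).permute(0, 2, 1, 3)
    x = x.reshape(T, x.shape[1], -1)               # (T, N, d) with d = C*p*p

    xb = x.permute(1, 0, 2)                        # (N, T, d): time attention per patch
    scores = xb @ xb.transpose(1, 2) / xb.shape[-1] ** 0.5
    scores = scores.masked_fill(causal_time_mask(T), float("-inf"))
    ctx = F.softmax(scores, dim=-1) @ xb           # causal temporal mixing
    ctx = ctx.permute(1, 0, 2)                     # back to (T, N, d)

    # vector-quantize: index of the nearest codebook entry per patch vector
    ids = torch.cdist(ctx.reshape(-1, ctx.shape[-1]), codebook).argmin(-1)
    return ids.reshape(T, -1)

# e.g. 11 RGB frames at 64x64 with a 512-entry codebook of dim 3*8*8
tokens = encode_video(torch.rand(11, 3, 64, 64), torch.randn(512, 3 * 8 * 8))
```

The real tokenizer is of course a trained network; the point of the sketch is the causal mask, which is what lets the encoder keep extending a video one step at a time.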

C-ViViT lets the model generate videos of any length, but how is the coherence of the final video guaranteed?

This relies on another important part of Phenaki: the bidirectional Transformer.

To save time, it uses a fixed number of sampling steps and predicts many video tokens in parallel, conditioned on the text prompt.

Combined with the point above, that C-ViViT compresses video in both the time and space dimensions, the compressed tokens carry temporal structure.

In other words, a Transformer trained with masking on these tokens also learns that temporal structure, so the coherence of the generated video follows naturally.
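As a sketch rather than Phenaki's actual sampler, the MaskGIT-style loop described above (a fixed number of steps, predicting every token in parallel, keeping only the most confident ones) can be written like this; the `model` callable, the mask id, and all shapes are assumptions:

```python
# MaskGIT-style parallel decoding sketch (illustrative only).
import math
import torch

MASK_ID = 0  # sentinel id reserved for a still-masked video token (assumed)

@torch.no_grad()
def sample_tokens(model, text_emb: torch.Tensor, num_tokens: int, steps: int = 12):
    """Start fully masked; at each of a FIXED number of steps, predict every
    token in parallel, commit the most confident ones, re-mask the rest."""
    tokens = torch.full((num_tokens,), MASK_ID, dtype=torch.long)
    fixed = torch.zeros(num_tokens, dtype=torch.bool)  # already-committed slots
    for s in range(1, steps + 1):
        logits = model(tokens, text_emb)               # (num_tokens, vocab_size)
        conf, pred = logits.softmax(-1).max(-1)        # parallel prediction
        conf[fixed] = -1.0                             # never re-pick fixed slots
        # cosine schedule: commit progressively more tokens each step;
        # at s == steps the whole sequence is committed
        target = int(num_tokens * (1 - math.cos(math.pi / 2 * s / steps)))
        k = max(target - int(fixed.sum()), 0)
        idx = conf.topk(k).indices
        tokens[idx] = pred[idx]
        fixed[idx] = True
    return tokens
```

Because every step looks at the whole (partially filled) token grid at once, each committed token is consistent with tokens both before and after it in time, which is where the coherence comes from.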

To learn more about Phenaki, see the project page:

Phenaki:
https://phenaki.github.io

Reference links:
[1] https://phenaki.video/
[2] https://phenaki.research.google/
[3] https://twitter.com/AiBreakfast/status/1614647018554822658
[4] https://twitter.com/EvanKirstel/status/1614676882758275072

- End -

"Artificial Intelligence" and "Smart Car" WeChat communities invite you to join!

If you're interested in artificial intelligence or smart cars, you're welcome to join the groups to exchange ideas with AI practitioners and keep up with the latest industry developments and technological progress.

PS. When adding us as a friend, please be sure to note your name, company, and position~

