Meta’s Make-A-Video Uses AI to Create Videos

Researchers at Meta have made a major advance in the field of artificially generated art with their new method, Make-A-Video, which is aptly named: it lets you generate a video from nothing but text. The outcomes are impressive and varied, and all of them are just a tad creepy.

Text-to-video models already exist; they are the logical next step after text-to-image models like DALL-E, which generate stills from input text. The transition from a static to a dynamic image may be simple for the human mind, but it is extremely challenging to implement in a machine learning model.

According to the paper describing Make-A-Video, “a model that has only seen text describing images is surprisingly effective at generating short videos” — meaning the system doesn’t need labeled text-video pairs to learn its trick.

The AI builds on the proven diffusion technique for image generation, which starts from pure noise and progressively “denoises” it until it arrives at an image matching the prompt. To this the team added a large amount of unlabeled video content used for unsupervised training (i.e., the model examined the footage without heavy guidance from humans).
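The denoising loop at the heart of diffusion can be sketched in toy form. This is a generic illustration of the technique, not Meta’s actual code: the signal, noise schedule, and “denoiser” below are all made up, and a real model would predict the noise with a neural network rather than knowing the clean target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a clean 1-D signal standing in for pixel data.
clean = np.sin(np.linspace(0, 2 * np.pi, 64))

# Noise schedule over T steps (values are illustrative only).
T = 50
betas = np.linspace(1e-4, 0.2, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal kept

# Reverse process: start from pure noise and iteratively pull the
# sample toward an estimate of the clean signal. Here we "cheat" and
# use the known target as that estimate, purely to show the loop shape.
x = rng.normal(size=clean.shape)
for t in reversed(range(T)):
    estimate = clean  # a real model would infer this from x and the prompt
    x = np.sqrt(alphas_bar[t]) * estimate + np.sqrt(1 - alphas_bar[t]) * x * 0.9

# After stepping t = T-1 ... 0, the sample has converged near the target.
print(float(np.abs(x - clean).mean()))  # error shrinks as steps proceed
```

The key point the sketch shows is the direction of the process: generation runs the noising schedule in reverse, refining random noise step by step.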

From the first source it learned how to create a photorealistic image; from the second, how video is composed of individual frames. Surprisingly, it can combine the two effectively without ever being explicitly taught how they should fit together.
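The division of labor described above — an image model supplying what frames look like, and a temporal component supplying how frames connect — can be caricatured in a few lines. The functions below are hypothetical stand-ins: real text-to-video models use learned temporal layers, not the linear blending shown here.

```python
import numpy as np

def generate_keyframe(seed: int, shape=(8, 8)) -> np.ndarray:
    """Stand-in for a text-to-image model's output (random array here)."""
    return np.random.default_rng(seed).random(shape)

def interpolate_frames(start: np.ndarray, end: np.ndarray, n_between: int):
    """Fill in intermediate frames by linear blending — a crude stand-in
    for the learned temporal layers that connect generated keyframes."""
    frames = []
    for i in range(1, n_between + 1):
        w = i / (n_between + 1)
        frames.append((1 - w) * start + w * end)
    return frames

# Two "generated" keyframes plus six in-betweens make a short clip.
key_a = generate_keyframe(0)
key_b = generate_keyframe(1)
clip = [key_a, *interpolate_frames(key_a, key_b, 6), key_b]
print(len(clip))  # 8 frames
```

The sketch makes the separation concrete: image knowledge produces the endpoints, and temporal knowledge fills the motion between them.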

The researchers state that Make-A-Video “sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures, including spatial and temporal resolution, faithfulness to the text, and quality.”

It’s hard to disagree. Earlier text-to-video systems, which took a different approach, produced results that were lackluster but encouraging. Make-A-Video blows them out of the water, achieving output roughly on par with what image generators like the original DALL-E were producing, say, 18 months ago.

It must be admitted, however, that they still have a strange quality. No one promised photorealism or perfectly natural motion, but the results are all a little bit nightmarish, aren’t they?

There’s just something dreadful and dreamlike about them. The animation has a strange stop-motion quality, and corruption and artifacts give everything a fuzzy, otherworldly look, as if objects were leaking into one another. It’s hard to tell where one person ends and another begins, or where one thing should end and another begin.

I don’t say any of this to sound like an artificial intelligence snob who demands nothing but the most photorealistic 4K images. It fascinates me how bizarre and unsettling all of these videos are despite their apparent realism. It’s incredible that they can be created so quickly and arbitrarily, and things will only improve from here. However, there is still an indefinable surreal quality present in even the most sophisticated image generators.

Just as image generators can be prompted by images, so can Make-A-Video transform still images and other videos into variants or extensions thereof. The findings are slightly less unsettling than expected. Compared to what came before, this is a huge improvement; the team deserves praise. Although it is not yet open to the public, you may sign up in order to be considered for future access in whatever form the developers ultimately decide upon.
