Microsoft's New AI Model Brings Images to Life with Talking Faces

Microsoft’s new AI model not only makes the lips match the audio but also shows emotions and other facial expressions.

Microsoft has introduced a new AI model called VASA-1 that generates lifelike videos of people talking. It needs only a single picture and a voice recording to produce a video. Microsoft says the videos have lip movements synchronized with the voice, along with natural face and head motion. The company does not plan to release it as a product; instead, it will use the technology to make virtual characters appear more realistic. Microsoft acknowledges that the model could be misused.

On its Research announcement page, Microsoft explained how the new model works and what it can do. According to the company, VASA-1 generates high-quality video at a resolution of up to 512 x 512 pixels and at up to 40 frames per second, and it starts producing output with very little delay. A user on X (formerly Twitter) shared a video of the model in action.

a single 4090
that's insane https://t.co/A73HrMewyP pic.twitter.com/fHjb2y1hQD
— Kaio Ken (@kaiokendev1) April 17, 2024

Credit: x.com (Twitter)

VASA-1 does more than match lip movements to the audio. It also captures a wide range of facial expressions, emotions, and natural head movements, and it lets users control settings such as where the eyes are looking and how far away the face appears. Together, these details make the resulting videos look realistic and lively.

Put simply, VASA-1 gets its lifelike appearance by using AI to separate different aspects of the face, such as expressions, the position of the head in 3D space, and the movement of the lips. This means each aspect can be changed and edited separately.
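For readers who think in code, the idea of editing one aspect without disturbing the others can be illustrated with a small Python sketch. This is only an illustration, not Microsoft's implementation: the `FaceFactors` class, its fields, and the `edit_head_pose` function are hypothetical names made up for this example, and the real model works with learned numerical representations rather than hand-written fields.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FaceFactors:
    """Hypothetical per-frame factors; NOT VASA-1's real representation."""
    expression: str                         # e.g. "neutral" or "smile"
    head_pose: Tuple[float, float, float]   # (yaw, pitch, roll) in degrees
    lip_openness: float                     # 0.0 = closed, 1.0 = fully open


def edit_head_pose(frame: FaceFactors, yaw: float) -> FaceFactors:
    """Return a copy with a new yaw; expression and lips stay untouched."""
    _, pitch, roll = frame.head_pose
    return replace(frame, head_pose=(yaw, pitch, roll))


# Turn the head by 20 degrees while keeping the smile and lip position.
original = FaceFactors(expression="smile", head_pose=(0.0, 5.0, 0.0), lip_openness=0.4)
turned = edit_head_pose(original, yaw=20.0)
```

Because each factor is stored separately, changing the head pose leaves the expression and the lip position exactly as they were, which is the kind of independent control the article describes.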

Beyond this, the model can also generate videos from artistic images, singing audio, and speech in other languages. Microsoft's researchers observed that it picked up these abilities on its own, even though it was not specifically trained for them.

Deepfake videos are becoming more common, and this model could be used for harm if it falls into the wrong hands. Microsoft says its goal is to make virtual AI avatars more emotionally expressive for positive purposes, not to create content meant to deceive people. Still, like other similar tools, it could be used to impersonate real people.

Microsoft believes the model could have beneficial uses, such as supporting education, assisting people who have difficulty communicating, or providing companionship to those who need it. For now, the model is not ready for use. According to Microsoft, the videos it generates still contain some flaws and are not yet as realistic as they could be.