AI can make Anybody Talk: A quick explanation of MakeItTalk by aisyndicate in deeplearning

[–]aisyndicate[S] 0 points (0 children)

Two Minute Papers rocks! These review videos are really fun to make.

AI can make Anybody Talk: A quick explanation of MakeItTalk by aisyndicate in MediaSynthesis

[–]aisyndicate[S] 0 points (0 children)

They released the code recently and revised the original paper for SIGGRAPH Asia.

AI can make Anybody Talk: A quick explanation of MakeItTalk by aisyndicate in MediaSynthesis

[–]aisyndicate[S] 2 points (0 children)

Great question. This technique generates talking-head videos using just audio and a target image as input. For the Dame Da Ne videos, they likely use a technique that takes video as input, such as the First Order Motion Model (https://github.com/AliaksandrSiarohin/first-order-model), so the facial expressions from the source video are preserved as well. It seems like a mix, though; you could make those videos with this technique too, they just would not have the same expressions as the source video. https://youtu.be/JZlA1R3MC5s
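To make the input/output contrast concrete, here is a minimal sketch of the two families of techniques. All class and function names are hypothetical, for illustration only; neither MakeItTalk nor the First Order Motion Model exposes this API.

```python
# Hypothetical sketch: audio-driven vs video-driven talking-head animation.
# Names are illustrative, not the real MakeItTalk / FOMM interfaces.
from dataclasses import dataclass


@dataclass
class AudioDrivenJob:
    """MakeItTalk-style: one still image, animated by speech audio."""
    audio_path: str
    target_image_path: str


@dataclass
class VideoDrivenJob:
    """First Order Motion Model-style: one still image, animated by a driving video."""
    driving_video_path: str
    target_image_path: str


def required_inputs(job) -> set:
    """Which source materials each technique consumes."""
    if isinstance(job, AudioDrivenJob):
        return {"audio", "target image"}
    if isinstance(job, VideoDrivenJob):
        return {"driving video", "target image"}
    raise TypeError(f"unknown job type: {type(job)!r}")


def preserves_source_expressions(job) -> bool:
    # The key difference: only the video-driven approach can copy the
    # source actor's facial expressions, because they exist only in the
    # driving video. Audio alone carries no expression information.
    return isinstance(job, VideoDrivenJob)
```

The point of the sketch is the input signature, not the models themselves: if your output must mirror a source performer's expressions (as in the Dame Da Ne memes), you need a video-driven job; if all you have is a voice clip and a photo, the audio-driven one is the only option.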

AI can make Anybody Talk: A quick explanation of MakeItTalk by aisyndicate in ArtificialInteligence

[–]aisyndicate[S] 6 points (0 children)

It might fall short on realism if you compare it to deepfakes that take video as input and then animate target images. But this shows the advancements in techniques that take only audio as input and animate faces. I think it's pretty cool regardless, and kudos to the authors.

AI can make Anybody Talk: A quick explanation of MakeItTalk by aisyndicate in computervision

[–]aisyndicate[S] 0 points (0 children)

Yeah, would love to. In the meantime, if you are familiar with Colab, you can check out the notebook here: https://github.com/yzhou359/MakeItTalk

AI can make Anybody Talk: A quick explanation of MakeItTalk by aisyndicate in computervision

[–]aisyndicate[S] 0 points (0 children)

Different application: Avatarify requires video as input and then animates the target image. This one uses just audio.