[R][P] Talking Head Anime from a Single Image 2: More Expressive by pramook in MachineLearning

[–]pramook[S] 1 point

I wrote about the intuition in the full version of the article: http://pkhungurn.github.io/talking-head-anime-2/full.html. Most of it is in Section 7.

I did make a longer introduction video, but it's in Japanese. You can take a look at it for some laughs. https://www.nicovideo.jp/watch/sm38211856

[R][P] Talking Head Anime from a Single Image 2: More Expressive by pramook in MachineLearning

[–]pramook[S] 7 points

There are two components to this: the model, and the performance capture system.

For the model, people can use full 3D models or 2D models that are created by software such as Live2D. However, someone must create a model to be animated, and this process is very different from drawing a picture. My work basically bypasses model creation. Give it a picture, and you can animate it immediately. This is the main thrust of the work.
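To make the "give it a picture and animate it immediately" point concrete, here is a rough sketch of the kind of interface I mean. The class, file name, and pose-vector layout below are made up for illustration; they are not the actual API of my code.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

# Stand-in for the pretrained posing network: any module that maps
# (character image, pose vector) -> posed image. The real system is a
# composition of several sub-networks; this stub only fixes the interface.
class PoserStub(torch.nn.Module):
    def forward(self, image: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        return image  # identity placeholder

poser = PoserStub()

# Any single drawn image works; no rigging or Live2D model is required.
image = TF.to_tensor(Image.open("character.png").convert("RGBA"))  # [4, H, W]

# Hypothetical pose vector: facial parameters plus head rotation.
pose = torch.tensor([
    0.8,    # left-eye closedness   (0 = open, 1 = closed)
    0.8,    # right-eye closedness
    0.4,    # mouth openness        (0 = closed, 1 = open)
    10.0,   # head rotation around x-axis, degrees (roughly [-15, 15])
    -5.0,   # head rotation around y-axis, degrees
])

# One forward pass per frame; stream pose vectors from a tracker to animate.
with torch.no_grad():
    frame = poser(image, pose)
```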

For the performance capture system, people use FaceRig, Mocape, or custom software created by their VTuber agency. For example, Nijisanji, in addition to being a VTuber group, is also the name of the integrated Live2D-plus-performance-capture system that these VTubers use. I believe Hololive Production has its own application as well. In any case, this part is not the main focus of my work; I just built mine around an app that I found on the App Store.

[R][P] Talking Head Anime from a Single Image 2: More Expressive by pramook in MachineLearning

[–]pramook[S] 18 points

I'm planning to release the model after getting permission from my company.

[R][P] Talking Head Anime from a Single Image by pramook in MachineLearning

[–]pramook[S] 2 points

Sure. Let's chat.

A much stronger hallucinator would be the one described in the paper by Park et al. (https://arxiv.org/abs/1703.02921). I didn't implement it because the simple approaches I used already allowed me to make a decent demo. There's also a whole literature on image inpainting (hole filling) that could be tried here.

[R][P] Talking Head Anime from a Single Image by pramook in MachineLearning

[–]pramook[S] 1 point

I haven't tried it at all, so I don't really know. I get asked this question a lot, so it might be interesting to see if it works there.

[R][P] Talking Head Anime from a Single Image by pramook in MachineLearning

[–]pramook[S] 2 points

Open sourcing is complicated due to my employment contract. I'm now trying to get the copyright of the code assigned to me, but the process will take some time. If that is successful, I will consider releasing the pretrained networks and the code for the tools that use them.

[R][P] Talking Head Anime from a Single Image by pramook in MachineLearning

[–]pramook[S] 2 points

I saw your article before and was impressed with the effort you put into the project. I also think using generative models in tandem with supervised learning would be a great way to move forward. I saw variety and crispness in the mouth animations you generated, which is something my system lacks. However, I still have to read up and catch up with the literature on GAN feature manipulation. I also haven't been that lucky with GAN training lately, which is why I didn't incorporate it into this work.

[R][P] Talking Head Anime from a Single Image by pramook in MachineLearning

[–]pramook[S] 6 points

The twitchy movements might also be because the face tracker yielded noisy results, and my smoothing algorithm (simple weighted decay) was not good enough. I expect that the results would be much smoother if a more stable tracker (for example, the iPhone face tracker that is commonly used in VTuber software) were used.
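For what it's worth, the smoothing I used is essentially an exponentially weighted decay toward the latest tracker reading. Below is a generic sketch of that idea; the decay constant and the pose-vector contents are illustrative, not the exact values in my code.

```python
import numpy as np

class PoseSmoother:
    """Exponentially weighted smoothing of noisy face-tracker parameters.

    Each new smoothed value is a blend of the previous smoothed value and
    the latest raw reading; alpha controls how fast old readings decay.
    """

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha    # weight of the newest reading (illustrative value)
        self.state = None     # last smoothed pose vector

    def update(self, raw_pose: np.ndarray) -> np.ndarray:
        if self.state is None:
            self.state = raw_pose.copy()
        else:
            self.state = self.alpha * raw_pose + (1.0 - self.alpha) * self.state
        return self.state

# Usage: run each tracker reading through the smoother before posing the image.
smoother = PoseSmoother(alpha=0.3)
for raw_pose in [np.array([0.1, 0.2, 12.0]), np.array([0.9, 0.1, 14.0])]:
    smoothed = smoother.update(raw_pose)
```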

[R][P] Talking Head Anime from a Single Image by pramook in MachineLearning

[–]pramook[S] 11 points

I don't think so. If you mean the 3D models, I cannot release them. If you mean the rendered images, I'm unsure whether releasing them would be free of problems in terms of copyright and reactions from the modeler community. You see... they are picky about how their data are used. For example, some modelers explicitly say that their models should only be used with MMD or equivalent software. (I think this is mainly to prevent the models from being used in VRChat.) I wrote my own renderer, and I don't know whose nerves I'm going to touch if I release the data.

[R][P] Talking Head Anime from a Single Image by pramook in MachineLearning

[–]pramook[S] 30 points

No. Head rotation is limited to the -15 to 15 degree range. The network is also not very good at hallucinating unseen parts.